22 research outputs found

    Filter techniques for region-based process discovery

    Get PDF
    The goal of process discovery is to learn a process model based on example behavior recorded in an event log. Region-based process discovery techniques are able to uncover complex process structures (e.g., milestones) and, at the same time, provide formal guarantees w.r.t. the model discovered. For example, it is possible to ensure that the discovered model is able to replay the event log and that there are bounds on the amount of additional behavior allowed by the model that is not present in the event log. Unfortunately, region-based discovery techniques cannot handle exceptional behavior. The presence of a few exceptional traces may result in an incomprehensible model concealing the dominant behavior observed. Hence, despite their promise, region-based approaches cannot be applied in everyday process mining practice. This paper addresses the problem by proposing two filtering techniques tailored towards ILP-based process discovery (an approach based on integer linear programming and language-based region theory). Both techniques help to produce models that are less over-fitting w.r.t. the event log and have been implemented in ProM. One of the techniques is also feasible in real-life settings as it, in most cases, reduces computation time compared to conventional region-based techniques. Additionally, the technique is able to produce understandable process models that better capture the dominant behavior present in the event log. Keywords: Process mining, process discovery, integer linear programming, filtering.
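    A minimal sketch of the general idea behind such filtering, assuming traces are given as tuples of activity labels: drop every trace that, at some point, continues a prefix with an activity that is rare among all observed continuations of that prefix. The function name, the threshold value and the log format are illustrative assumptions, not the paper's exact filter.

        from collections import Counter, defaultdict

        def filter_infrequent_behavior(log, threshold=0.05):
            # log: list of traces, each a tuple of activity labels (assumed format).
            # Count how often each activity follows each observed prefix.
            continuations = defaultdict(Counter)
            for trace in log:
                for i, activity in enumerate(trace):
                    continuations[trace[:i]][activity] += 1

            def is_dominant(trace):
                for i, activity in enumerate(trace):
                    counts = continuations[trace[:i]]
                    if counts[activity] / sum(counts.values()) < threshold:
                        return False  # exceptional continuation: drop the trace
                return True

            return [trace for trace in log if is_dominant(trace)]

        if __name__ == "__main__":
            log = [("a", "b", "c")] * 40 + [("a", "c", "b")] * 9 + [("a", "x", "c")]
            print(len(filter_infrequent_behavior(log)))  # 49; the single "a, x, c" outlier is dropped

    Pre-filtering the log is only one way to realize the idea; the techniques in the paper are tailored to the ILP-based discovery approach itself.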

    Handling Big(ger) logs: Connecting ProM 6 to apache hadoop

    Get PDF
    Within process mining the main goal is to support the analysis, improvement and apprehension of business processes. Numerous process mining techniques have been developed with that purpose. The majority of these techniques use conventional computation models and do not apply novel scalable and distributed techniques. In this paper we present an integrative framework connecting the process mining framework ProM with the distributed computing environment Apache Hadoop. The integration allows for the execution of MapReduce jobs on any Apache Hadoop cluster, enabling practitioners and researchers to explore and develop scalable and distributed process mining approaches. Thus, the new approach enables the application of different process mining techniques to event logs of several hundreds of gigabytes. Keywords: Process mining, Big Data, scalability, distributed computing, ProM, Apache Hadoop.
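    Purely as an illustration of the kind of MapReduce job such an integration runs, the sketch below simulates a map/shuffle/reduce pipeline that counts directly-follows pairs in an event log. The input format (one "case_id,activity" line per event, with the events of a case already in order) and the choice of abstraction are assumptions, not the framework's actual interface.

        from collections import defaultdict
        from itertools import groupby

        def map_phase(lines):
            # Mapper: rebuild traces per case and emit ((a, b), 1) for every
            # directly-follows pair; assumes events arrive ordered per case.
            events_per_case = defaultdict(list)
            for line in lines:
                case_id, activity = line.strip().split(",")
                events_per_case[case_id].append(activity)
            for trace in events_per_case.values():
                for a, b in zip(trace, trace[1:]):
                    yield (a, b), 1

        def reduce_phase(pairs):
            # Shuffle and reduce: sort by key, then sum the counts per pair.
            for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
                yield key, sum(count for _, count in group)

        if __name__ == "__main__":
            lines = ["1,a", "1,b", "1,c", "2,a", "2,c", "2,b"]
            for (a, b), n in reduce_phase(map_phase(lines)):
                print(f"{a} -> {b}: {n}")

    In an actual Hadoop deployment the two phases would run as separate distributed tasks; the local simulation only mirrors the data flow.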

    Performance-preserving event log sampling for predictive monitoring

    Get PDF
    Predictive process monitoring is a subfield of process mining that aims to estimate case or event features for running process instances. Such predictions are of significant interest to the process stakeholders. However, most of the state-of-the-art methods for predictive monitoring require the training of complex machine learning models, which is often inefficient. Moreover, most of these methods require hyper-parameter optimization, which involves several repetitions of the training process and is not feasible in many real-life applications. In this paper, we propose an instance selection procedure that allows sampling of training process instances for prediction models. We show that our instance selection procedure allows for a significant increase in training speed for next activity and remaining time prediction methods while maintaining reliable levels of prediction accuracy.
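    A minimal sketch of one plausible instance selection strategy, assuming cases are given as (case id, trace) pairs: sample a fixed fraction of cases per trace variant so that the reduced training set still reflects the behavior in the log. This is an assumed simplification, not the paper's exact procedure; the fraction, seed and data layout are illustrative.

        import random
        from collections import defaultdict

        def sample_by_variant(cases, fraction=0.2, seed=42):
            # cases: list of (case_id, tuple_of_activities) pairs (assumed format).
            rng = random.Random(seed)
            by_variant = defaultdict(list)
            for case_id, trace in cases:
                by_variant[trace].append(case_id)

            sampled = []
            for trace, case_ids in by_variant.items():
                # Keep at least one case per variant so rare behavior stays represented.
                k = max(1, round(fraction * len(case_ids)))
                sampled.extend((case_id, trace) for case_id in rng.sample(case_ids, k))
            return sampled

        if __name__ == "__main__":
            cases = [(i, ("a", "b", "c")) for i in range(50)] + [(99, ("a", "c"))]
            print(len(sample_by_variant(cases)))  # 11: roughly 20% of the log

    Training a next-activity or remaining-time model on the sampled cases instead of the full log is where the speed-up comes from.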

    Process mining with streaming data

    Get PDF

    ILP-based process discovery using hybrid regions

    No full text
    The language-based theory of regions, stemming from the area of Petri net synthesis, forms a fundamental basis for Integer Linear Programming (ILP)-based process discovery. Based on example behavior in an event log, a process model is derived that aims to describe the observed behavior. Building on top of the existing ILP formulation, we present a new ILP-based process discovery formulation that unifies two existing types of language-based regions. Additionally, we present a generalized ILP objective function that captures both region types and helps us to find suitable process discovery results.
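    To make the region notion concrete, the toy sketch below enumerates candidate places with 0/1 arc weights and checks the standard language-based region constraint: the place must never go negative while replaying any prefix of the log. The actual contribution solves such constraint systems as an ILP with an objective function; the brute-force search, the tiny weight domain and the example log are assumptions for illustration only.

        from itertools import product

        def prefixes(log):
            # All non-empty prefixes occurring in the log (duplicates removed).
            return {trace[:i] for trace in log for i in range(1, len(trace) + 1)}

        def is_region(m, x, y, log):
            # Constraint: for every prefix w.a of the log,
            # m + x*parikh(w) - y*parikh(w.a) >= 0, i.e. the candidate place with
            # initial marking m, producing weights x and consuming weights y never
            # goes negative while replaying the log.
            for prefix in prefixes(log):
                w = prefix[:-1]
                if m + sum(x[b] for b in w) - sum(y[b] for b in prefix) < 0:
                    return False
            return True

        if __name__ == "__main__":
            log = [("a", "b", "c"), ("a", "c", "b")]
            acts = sorted({a for trace in log for a in trace})
            found = []
            for m, xs, ys in product((0, 1), product((0, 1), repeat=len(acts)),
                                     product((0, 1), repeat=len(acts))):
                x, y = dict(zip(acts, xs)), dict(zip(acts, ys))
                if any(ys) and is_region(m, x, y, log):
                    found.append((m, x, y))
            print(len(found), "feasible 0/1 regions, e.g.", found[0])

    Roughly speaking, each feasible assignment corresponds to a place that could be added to the discovered net; the ILP objective is what selects useful places among them.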

    Event stream-based process discovery using abstract representations

    Get PDF
    The aim of process discovery, originating from the area of process mining, is to discover a process model based on business process execution data. A majority of process discovery techniques relies on an event log as input. An event log is a static source of historical data capturing the execution of a business process. In this paper, we focus on process discovery relying on online streams of business process execution events. Learning process models from event streams poses both challenges and opportunities, i.e., we need to handle unlimited amounts of data using finite memory and, preferably, constant time. We propose a generic architecture that allows for adopting several classes of existing process discovery techniques in the context of event streams. Moreover, we provide several instantiations of the architecture, accompanied by implementations in the process mining toolkit ProM (http://promtools.org). Using these instantiations, we evaluate several dimensions of stream-based process discovery. The evaluation shows that the proposed architecture allows us to lift process discovery to the streaming domain.
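    As a concrete example of keeping such an abstract representation in finite memory, the sketch below maintains an approximate directly-follows abstraction over a stream of (case id, activity) events with capped numbers of tracked cases and pairs. The class, the capacities and the eviction policy are assumed simplifications; the architecture in the paper and its ProM instantiations may use different abstractions and approximation schemes.

        from collections import Counter

        class StreamingDirectlyFollows:
            def __init__(self, max_cases=1000, max_pairs=1000):
                self.last_activity = {}   # case id -> last activity seen for that case
                self.pairs = Counter()    # (a, b) -> approximate directly-follows count
                self.max_cases = max_cases
                self.max_pairs = max_pairs

            def observe(self, case_id, activity):
                previous = self.last_activity.get(case_id)
                if previous is not None:
                    self.pairs[(previous, activity)] += 1
                self.last_activity[case_id] = activity
                self._shrink()

            def _shrink(self):
                # Keep memory finite: evict the oldest case and the least frequent
                # pairs once the configured capacities are exceeded.
                if len(self.last_activity) > self.max_cases:
                    self.last_activity.pop(next(iter(self.last_activity)))
                while len(self.pairs) > self.max_pairs:
                    victim, _ = min(self.pairs.items(), key=lambda kv: kv[1])
                    del self.pairs[victim]

        if __name__ == "__main__":
            abstraction = StreamingDirectlyFollows(max_cases=2, max_pairs=10)
            stream = [("1", "a"), ("2", "a"), ("1", "b"), ("3", "a"), ("2", "b"), ("1", "c")]
            for case_id, activity in stream:
                abstraction.observe(case_id, activity)
            print(dict(abstraction.pairs))  # {('a', 'b'): 2}; case "1" was evicted before its "c"

    A downstream discovery technique would then periodically read the abstraction to produce an up-to-date model.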

    Assessing process discovery scalability in data intensive environments

    No full text
    Tremendous developments in Information Technology (IT) have enabled us to store and process huge amounts of data at unprecedented rates. This phenomenon largely impacts business processes. The field of process discovery, originating from the area of process mining, is concerned with automatically discovering process models from event data related to the execution of business processes. In this paper, we assess the scalability of applying process discovery techniques in data intensive environments. We propose ways to compute the internal data abstractions used by the discovery techniques within the MapReduce framework. The combination of MapReduce and process discovery enables us to tackle much bigger event logs in less time. Our generic approach scales linearly in terms of the data size and the number of computational resources used and thus shows great potential for the adoption of process discovery in a Big Data context.
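    The sketch below illustrates the divide-and-merge pattern behind computing such an abstraction at scale: each partition of the log is reduced to partial directly-follows counts in parallel, and the partial results are merged. The local process pool, the number of partitions and the choice of directly-follows counts as the abstraction are illustrative assumptions; the paper targets the MapReduce framework proper.

        from collections import Counter
        from concurrent.futures import ProcessPoolExecutor

        def partial_dfg(traces):
            # "Map" step: directly-follows counts for one partition of the log.
            counts = Counter()
            for trace in traces:
                counts.update(zip(trace, trace[1:]))
            return counts

        def merge(counters):
            # "Reduce" step: sum the partial counts into one abstraction.
            total = Counter()
            for partial in counters:
                total.update(partial)
            return total

        if __name__ == "__main__":
            log = [("a", "b", "c"), ("a", "c", "b")] * 1000
            partitions = [log[i::4] for i in range(4)]  # four workers, four partitions
            with ProcessPoolExecutor(max_workers=4) as pool:
                partials = list(pool.map(partial_dfg, partitions))
            print(merge(partials).most_common(3))

    Because the partial results only need to be summed, adding workers mainly divides the per-worker load, which is in line with the linear scaling reported above.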
